Detecting CD8 Tpex with ProjecTILs

In this tutorial, we show how to use ProjecTILs to detect Tpex in public datasets where they were not originally identified. The human CD8 TIL reference used here was built from Nick Borcherding's collection of single-cell datasets from tumor patients. The map consists of 11,021 high-quality single-cell transcriptomes from 20 samples covering 7 tumor types.

The process and code used to build the map can be found in this GitHub repo.

Briefly, this reference was built from highly curated tumor-infiltrating CD8+ T cells from 20 samples, integrated using semi-supervised STACAS. Unsupervised clustering was performed on the integrated samples. Clusters were manually annotated with respect to classical immunology markers, with a special interest in detecting Tpex. Clusters were then downsampled to at most 2000 cells per cluster. In our experience, this more balanced cluster composition improves projection, as reported here and here. Finally, the map was converted into a ProjecTILs reference. Please note that, as this reference was constructed from tumor-infiltrating samples, it might not work perfectly when mapping other tissues, such as blood or draining lymph nodes (DLN).
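
The downsampling step described above can be sketched in base R. This is a minimal illustration on toy metadata, not the code used to build the map: the data frame `meta`, the cluster names and the cell counts are all made up for the example, and only the cap of 2000 cells per cluster comes from the text.

```r
# Toy metadata: one row per cell with a hypothetical cluster label
set.seed(42)
meta <- data.frame(
  cell = paste0("cell", 1:6000),
  cluster = rep(c("A", "B", "C"), times = c(3500, 2300, 200)),
  stringsAsFactors = FALSE
)

max.cells <- 2000  # cap per cluster, as described in the text

# Per cluster: keep every cell if the cluster is below the cap,
# otherwise draw a random sample of max.cells cells
keep <- unlist(lapply(split(meta$cell, meta$cluster), function(cells) {
  if (length(cells) > max.cells) sample(cells, max.cells) else cells
}), use.names = FALSE)

meta.ds <- meta[meta$cell %in% keep, ]
table(meta.ds$cluster)
```

On a Seurat object, the resulting cell names could then be passed to `subset(object, cells = keep)`. Small clusters are left untouched, so rare populations are not diluted further.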

Why use projection?

Projection allows cells to be classified using a well-curated ProjecTILs reference map.

  • This method has the benefit of using the same cell types across projects, which is highly beneficial when analyzing large collections of datasets.

  • This reference is also a way to annotate datasets only by stable cell types, and not transient cell states, like cell cycle or activation.

  • Results will be comparable across conditions, as there is little manual tuning. For instance, there is no need for steps which have a high impact, such as highly variable gene selection. This step was previously done when building the map.

  • Projection is robust to batch effects, such as differences in single-cell technology or sequencing depth (for more information, please see the ProjecTILs paper, Andreatta et al. 2021).

What happens if some cells are not covered by the reference?

If some cells are not covered by the reference, they should be filtered out (e.g. CD4 T cells if the reference covers CD8 T cells).

By default, both Run.ProjecTILs() and ProjecTILs.classifier() have the parameter filter.cells set to TRUE. This means that cells outside the scope of the reference will be filtered out using the built-in scGate model. This model is stored in the misc slot of the reference Seurat object: ref@misc$scGate. You can customize this filtering by amending this slot using the scGate grammar.

Human CD8 TIL reference

First, let’s have a look at the reference map.

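# Load the packages used throughout this tutorial
library(ProjecTILs)
library(Seurat)
library(ggplot2)
library(dplyr)
library(patchwork)
library(SignatuR)

# Load the reference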
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")
## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
# Setup colors
mycols <- ref.cd8@misc$atlas.palette

# Compute DEGs
DefaultAssay(ref.cd8) <- "RNA"
ref.cd8 <- NormalizeData(ref.cd8)
markers <- FindAllMarkers(object = ref.cd8, only.pos = TRUE, assay = "RNA")

# Remove TCR genes
tcr.genes <- SignatuR::GetSignature(SignatuR$Hs$Compartments$TCR)
markers <- markers %>% filter(!gene %in% unname(tcr.genes))
markers %>% group_by(cluster) %>% top_n(n = 3, wt = avg_log2FC) -> top3

# DimPlot
DimPlot(ref.cd8,  group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)

Here are the different T cell subsets defined in the map:

  • CD8.NaiveLike: Antigen-naive T cells

  • CD8.CM: Central Memory T cells

  • CD8.EM: Effector Memory T cells

  • CD8.TEMRA: Effector Memory cells re-expressing CD45RA. Sometimes called Short Lived Effectors (SLEC), or Cytotoxic effectors

  • CD8.TPEX: Progenitor exhausted T cells

  • CD8.TEX: Exhausted T cells

  • CD8.MAIT: MAIT cells, innate-like T cells defined by their semi-invariant αβ T cell receptor (TCR).

Let’s check Differentially Expressed Genes (DEGs) between each cluster, to confirm cluster marker genes.

# Stacked violin plot of the top 3 marker genes per cluster
VlnPlot(ref.cd8, assay = "RNA", features  = top3$gene, cols = mycols, stack = T, flip = T, fill.by = "ident") + NoLegend()

Progenitor exhausted T cells (Tpex)

Pioneering work in the murine lymphocytic choriomeningitis virus (LCMV) model has mapped the molecular and phenotypic profiles of CD8+ T cells, revealing progenitors of exhausted T cells (TPEX), defined by the expression of the transcription factors TOX and TCF1. These cells arise in the acute phase of infection and sustain terminally exhausted subsets over the long term. As they are thought to renew the pool of terminally exhausted cells, an increasing number of reports show that this population is of primary importance for cancer immunotherapy (Utzschneider et al., Siddiqui et al.).

Here are some markers to pinpoint them:

Positive markers: TCF7, CD200, CRTAM, GNG4, TOX, LEF1, CCR7, CXCL13, XCL1, XCL2

Negative markers: GZMB, NKG7, PRF1, HAVCR2, CCL5, GZMA
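
As a rough illustration of how these two marker lists can be combined, the sketch below computes a naive per-cell score (mean of the positive markers minus mean of the negative markers) on a toy expression matrix. The matrix values and cell names are invented, and this simple score is only an assumption for illustration; it is not how ProjecTILs assigns labels.

```r
pos <- c("TCF7", "CD200", "CRTAM", "GNG4", "TOX", "LEF1", "CCR7", "CXCL13", "XCL1", "XCL2")
neg <- c("GZMB", "NKG7", "PRF1", "HAVCR2", "CCL5", "GZMA")
genes <- c(pos, neg)

# Toy log-normalized matrix (genes x cells), baseline expression of 1 everywhere
expr <- matrix(1, nrow = length(genes), ncol = 3,
               dimnames = list(genes, c("cell1", "cell2", "cell3")))

# cell1: Tpex-like profile (positive markers high, negative markers absent)
expr[pos, "cell1"] <- 3
expr[neg, "cell1"] <- 0

# cell2: effector-like profile (the opposite pattern)
expr[pos, "cell2"] <- 0
expr[neg, "cell2"] <- 3

# Naive score: mean of positive markers minus mean of negative markers
tpex.score <- colMeans(expr[pos, , drop = FALSE]) - colMeans(expr[neg, , drop = FALSE])
tpex.score
```

Cells with a clearly positive score express the Tpex program without the cytotoxic one; in practice, a dedicated signature scoring method would be more robust to dropout than plain means.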

Tumor T cell differentiation model, going through an intermediate state of progenitor exhausted cells (Tpex). Figure from Andreatta et al. 2021.

Tpex are of growing interest, but apart from a few studies (Oliveira et al., Magen et al., Zheng et al.), this subset has been hard to detect in human samples. Having a human CD8 reference with clearly annotated Tpex addresses this issue.

Detecting Tpex in Gueguen et al. 2021

Setup data

#download.file("https://figshare.com/ndownloader/files/39082049", destfile = "gueguen.cd3.Rds")
gueguen.cd3 <- readRDS("gueguen.cd3.Rds")
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)

Projection

Thanks to scGate filtering, only the CD8 clusters (upper part of UMAP) were mapped.

# Projection
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
table(gueguen.cd3$functional.cluster)
## 
##        CD8.CM        CD8.EM      CD8.MAIT CD8.NaiveLike     CD8.TEMRA 
##          2132          3181           147           238           276 
##       CD8.TEX      CD8.TPEX 
##          3340           337
DimPlot(gueguen.cd3, order = T,  label = T, repel = T) 

DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)

# Radar plots
p <- plot.states.radar(ref.cd8, query = gueguen.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T) 
wrap_plots(p) + theme_bw()

We can see that the previously homogeneous cluster CD8-LAYN seems to be in fact composed of two subsets: CD8.TEX and CD8.TPEX.
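
One way to quantify this kind of observation is to cross-tabulate the original clusters against the projected labels. The sketch below uses toy metadata: only the names CD8-LAYN, CD8.TEX and CD8.TPEX come from the text, while the second cluster (`other.cluster`) and all counts are hypothetical.

```r
# Toy per-cell metadata: original cluster and projected label
meta <- data.frame(
  seurat_clusters = c(rep("CD8-LAYN", 6), rep("other.cluster", 4)),
  functional.cluster = c("CD8.TEX", "CD8.TEX", "CD8.TEX", "CD8.TEX",
                         "CD8.TPEX", "CD8.TPEX",
                         "CD8.EM", "CD8.EM", "CD8.CM", NA),
  stringsAsFactors = FALSE
)

# Count cells per (original cluster, projected label); NA labels are dropped
tab <- table(meta$seurat_clusters, meta$functional.cluster)

# Convert counts to row-wise fractions: composition of each original cluster
frac <- prop.table(tab, margin = 1)
round(frac, 2)
```

With a real object, the same two columns would be taken from the Seurat metadata, for example `gueguen.cd3$seurat_clusters` and `gueguen.cd3$functional.cluster`.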

How to assess quality/robustness of mapping?

It can be hard to tell apart cells whose state genuinely differs from the reference and cells that are simply mapped incorrectly. We usually recommend assessing mapping quality by checking that top marker genes are expressed consistently between the reference and the query. If the query looks quite different from the reference, we recommend examining the DEGs between the reference and the query for each cell type of interest.
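
A simple way to put a number on marker consistency is to correlate average marker expression between reference and query for each cell type. The sketch below assumes the per-cell-type averages have already been computed (for instance with Seurat's `AverageExpression()`); the two matrices and their values are made up for illustration.

```r
markers <- c("TCF7", "TOX", "HAVCR2", "GZMK")
cell.types <- c("CD8.TPEX", "CD8.TEX")

# Toy average expression (genes x cell types) in the reference...
ref.avg <- matrix(c(2.0, 1.5, 0.2, 0.5,    # CD8.TPEX
                    0.3, 1.8, 2.5, 0.6),   # CD8.TEX
                  nrow = 4, dimnames = list(markers, cell.types))

# ...and in the query, for cells assigned to the same labels
query.avg <- matrix(c(1.8, 1.4, 0.3, 0.6,
                      0.4, 1.6, 2.2, 0.8),
                    nrow = 4, dimnames = list(markers, cell.types))

# Pearson correlation of marker profiles, one value per cell type
consistency <- sapply(cell.types, function(ct) cor(ref.avg[, ct], query.avg[, ct]))
round(consistency, 3)
```

A low correlation for a given cell type would flag it for closer inspection, for example by looking at the genes that differ most between reference and query.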

Detecting Tpex in Yost et al. 2019

Setup data

# Load data
#download.file("https://figshare.com/ndownloader/files/39109277", destfile = "Yost.cd3.Rds")
Yost.cd3 <- readRDS("Yost.cd3.Rds")

# Normalize data
Yost.cd3 <- NormalizeData(Yost.cd3)
Yost.cd3 <- ScaleData(Yost.cd3)
## Centering and scaling data matrix
# DimPlots
DimPlot(Yost.cd3, reduction = 'umap', group.by = 'cluster', label = T)

DimPlot(Yost.cd3, reduction = 'umap', group.by = 'patient', label = T, repel = T)

The plots above show the annotation from the original study, including clusters. We can see that the activation cluster (CD8_act) is patient-specific, as it seems to be driven mainly by patient su008 after immunotherapy treatment.
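
Patient specificity can also be checked numerically by asking, for each cluster, which patient contributes the most cells and what fraction that represents. The sketch below runs on toy metadata: CD8_act and su008 come from the text, whereas `other.cluster`, `patientA`, `patientB` and all counts are hypothetical.

```r
# Toy per-cell metadata: cluster and patient of origin
meta <- data.frame(
  cluster = c(rep("CD8_act", 10), rep("other.cluster", 9)),
  patient = c(rep("su008", 8), "patientA", "patientB",
              rep(c("su008", "patientA", "patientB"), each = 3)),
  stringsAsFactors = FALSE
)

# Cells per (cluster, patient)
tab <- table(meta$cluster, meta$patient)

# For each cluster: the most represented patient and its share of the cluster
top.patient <- colnames(tab)[apply(tab, 1, which.max)]
top.fraction <- apply(tab, 1, max) / rowSums(tab)

data.frame(cluster = rownames(tab), top.patient, top.fraction, row.names = NULL)
```

A cluster where one patient accounts for most of the cells is more likely to reflect a patient-specific or treatment-specific state than a shared cell type.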

Projection

As this dataset is a mix between CD4 and CD8 T cells, we will keep the parameter filter.cells as TRUE to keep only CD8+ T cells.

DefaultAssay(Yost.cd3) <- "RNA"
Yost.cd3 <- ProjecTILs.classifier(Yost.cd3, ref = ref.cd8, filter.cells = T, split.by = 'patient', ncores = 6)
table(Yost.cd3$functional.cluster)
## 
##        CD8.CM        CD8.EM      CD8.MAIT CD8.NaiveLike     CD8.TEMRA 
##          3806          5296           316           438           965 
##       CD8.TEX      CD8.TPEX 
##          2370           499
DimPlot(Yost.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)

We do detect TPEX, located next to the TEX clusters, which makes sense. Let’s check how the expression profiles look.

# Radar plots
p <- plot.states.radar(ref.cd8, query = Yost.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T) 
wrap_plots(p) + theme_bw()

We can see that in the Yost et al. dataset, CD8.TPEX cells are found with profiles matching the reference, including within the CD8_act cluster.

Now let’s focus on the CD8_act cluster only.

Yost.cd3.sub <- subset(Yost.cd3, subset = cluster == "CD8_act")
DimPlot(Yost.cd3.sub, group.by = "functional.cluster", cols = mycols, repel = T, label = T)
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

DefaultAssay(Yost.cd3.sub) <- 'RNA'
FeaturePlot(Yost.cd3.sub, features = c('IL7R','FGFBP2','GZMK'), ncol = 3, pt.size = 0.5, order = T, cols = pals::coolwarm()) & NoLegend()

We see that, in the original reduced space, the activated cluster contains cell types from our CD8 reference, including the CM, TEMRA and EM clusters (high for IL7R, FGFBP2 and GZMK, respectively). If you are interested in recovering cell types hidden by transient cell states, you can read more in the corresponding tutorial.